Determine the Entity Number in Hierarchical Clustering for Web Personal Name Disambiguation
Authors
Abstract
Internet users searching for information about a person are often frustrated by ambiguous names in web search results. Hierarchical clustering methods are often used to group the personal-name mentions that refer to the same entity. Because the correct number of entities for a given personal name is not known in advance, the cut point in the dendrogram must be determined to achieve high disambiguation accuracy. In this paper, we explore appropriate cut points in hierarchical clustering for web personal name disambiguation. We first measure the similarity and density distribution of the search result pages, and then propose an approach that combines global distribution features with local features of the cut points to select appropriate cut points. Finally, we perform experiments on real-world datasets, and the results show that our method is effective.
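To make the idea of a dendrogram cut point concrete, here is a minimal sketch in Python. It is not the method proposed in the paper: it uses a generic "largest gap between consecutive merge distances" heuristic, and the token-set page representation, Jaccard distance, average linkage, and all function names are assumptions chosen only for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_link(c1, c2, pages):
    """Average pairwise distance between two clusters of page indices."""
    total = sum(jaccard_distance(pages[i], pages[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def build_dendrogram(pages):
    """Agglomerative clustering; history[k] = (k-th merge distance, partition after k merges)."""
    clusters = [(i,) for i in range(len(pages))]
    history = [(0.0, list(clusters))]  # entry 0: every page is its own cluster
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties broken by cluster index order).
        dist, a, b = min(
            (average_link(x, y, pages), x, y) for x, y in combinations(clusters, 2)
        )
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        history.append((dist, list(clusters)))
    return history


def cut_at_largest_gap(history):
    """Assumed heuristic: cut just before the largest jump in merge distance."""
    if len(history) < 3:
        return history[0][1]  # too few merges to compare gaps
    best_k = max(
        range(1, len(history) - 1),
        key=lambda k: history[k + 1][0] - history[k][0],
    )
    return history[best_k][1]


# Hypothetical search results for one ambiguous name, as bags of tokens.
pages = [
    {"smith", "physics", "quantum", "cern"},
    {"smith", "physics", "quantum", "lab"},
    {"smith", "guitar", "band", "tour"},
    {"smith", "guitar", "band", "album"},
    {"smith", "guitar", "tour", "album"},
]
print(sorted(cut_at_largest_gap(build_dendrogram(pages))))
```

The number of clusters in the returned partition is the estimated entity number. A largest-gap rule looks only at local information around each candidate cut; the abstract describes combining such local features with global distribution features, which this sketch does not attempt.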
Similar resources
CU-COMSEM: Exploring Rich Features for Unsupervised Web Personal Name Disambiguation
The increasing number of web sources is exacerbating the named-entity ambiguity problem. This paper explores the use of various token-based and phrase-based features in unsupervised clustering of web pages containing personal names. From these experiments, we find that the use of rich features can significantly improve the disambiguation performance for web personal names.
PRIS at Chinese Language Processing
The more Chinese language materials come out, the more we have to focus on the “same personal name” problem. In our personal name disambiguation system, hierarchical agglomerative clustering is applied, and named entities are used as features for document similarity calculation. We propose a two-stage strategy in which the first stage involves word segmentation and named entity recognition (NER...
Automatic Annotation of Ambiguous Personal Names on the Web
Personal name disambiguation is an important task in social network extraction, evaluation and integration of ontologies, information retrieval, cross-document co-reference resolution and word sense disambiguation. We propose an unsupervised method to automatically annotate people with ambiguous names on the web using automatically extracted keywords. Given an ambiguous personal name, first, we...
Person Name Disambiguation on the Web Using Query Expansion
The more important web search becomes, the bigger the same-name problem in web search. The proposed solution is to form clusters of people from search results. In this paper, we report our algorithms that disambiguate person names in web search results. Our clustering algorithm is based on hierarchical agglomerative clustering using named entities, compound key words and URLs as features fo...
Clustering web people search results using fuzzy ants
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. T...
Publication date: 2009